Week 7
Milestones
- Ran experiments with CLIP:
  - Drew confusion matrices comparing the classification errors of CLIP and Tesseract; result: CLIP performs better.
- Set up a git repository, shrivastava95/clip-ocr, for fine-tuning CLIP on a given dataset.
- Created a dataset of cropped word images from some pages of en-or.pdf (one plausible cropping pipeline is sketched after this list).
- Implemented the baseline zero-shot approach (second sketch below).
- Implemented CoOp (https://arxiv.org/abs/2109.01134) as a cheaper alternative to fine-tuning CLIP (third sketch below).
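The report doesn't say how the word crops were produced, so the following is only a minimal sketch of one plausible pipeline: rasterize the PDF with pdf2image and crop Tesseract's word boxes. All paths and output names besides en-or.pdf are illustrative.

```python
import os
from pdf2image import convert_from_path  # requires poppler installed
import pytesseract

# Hypothetical output directory; not taken from the report.
os.makedirs("word_crops", exist_ok=True)
pages = convert_from_path("en-or.pdf", dpi=300)

n, labels = 0, []
for page in pages:
    # Tesseract returns per-word bounding boxes and confidences.
    data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        if word.strip() and int(float(data["conf"][i])) > 0:
            l, t = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            page.crop((l, t, l + w, t + h)).save(f"word_crops/{n:05d}.png")
            labels.append(word)
            n += 1

with open("word_crops/labels.txt", "w") as f:
    f.write("\n".join(labels))
```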
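A minimal sketch of the baseline zero-shot approach, assuming the openai/CLIP package: each cropped word image is scored against a candidate vocabulary, and the text with the highest cosine similarity wins. The vocabulary, prompt template, and file name here are assumptions, not taken from the report.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Illustrative candidate words and prompt template.
vocab = ["hello", "world", "example"]
text_tokens = clip.tokenize([f'a photo of the word "{w}"' for w in vocab]).to(device)
image = preprocess(Image.open("word_crop.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text_tokens)
    # Normalize so the dot product equals cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    sims = (image_features @ text_features.T).squeeze(0)

print("predicted word:", vocab[sims.argmax().item()])
```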
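A condensed sketch of the CoOp idea from the linked paper: the hand-written prompt is replaced by learnable context vectors prepended to each class name's token embeddings, optimized while all of CLIP stays frozen. This follows the structure of the official CoOp reference code, but details such as n_ctx and the initialization scale are illustrative.

```python
import torch
import torch.nn as nn
import clip

class CoOpTextEncoder(nn.Module):
    """Runs CLIP's frozen text transformer on prompts whose first n_ctx
    token embeddings are learnable context vectors (arXiv:2109.01134)."""
    def __init__(self, clip_model, classnames, n_ctx=16, device="cpu"):
        super().__init__()
        dtype = clip_model.dtype
        ctx_dim = clip_model.ln_final.weight.shape[0]
        # Learnable context vectors, shared across all classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim, dtype=dtype) * 0.02)

        # Tokenize "X X ... X <classname>" so the sequence layout matches CLIP's.
        prompts = [" ".join(["X"] * n_ctx) + " " + name for name in classnames]
        tokenized = torch.cat([clip.tokenize(p) for p in prompts]).to(device)
        with torch.no_grad():
            embedding = clip_model.token_embedding(tokenized).type(dtype)

        # Frozen pieces: the SOS embedding and the class-name + EOT suffix.
        self.register_buffer("prefix", embedding[:, :1, :])
        self.register_buffer("suffix", embedding[:, 1 + n_ctx:, :])
        self.register_buffer("eot_idx", tokenized.argmax(dim=-1))
        self.clip_model = clip_model

    def forward(self):
        n_cls = self.prefix.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        x = torch.cat([self.prefix, ctx, self.suffix], dim=1)
        x = x + self.clip_model.positional_embedding.type(x.dtype)
        x = x.permute(1, 0, 2)                 # (seq, batch, dim) for CLIP
        x = self.clip_model.transformer(x)
        x = x.permute(1, 0, 2)
        x = self.clip_model.ln_final(x).type(x.dtype)
        # Take features at the EOT token and project, as CLIP itself does.
        return x[torch.arange(n_cls), self.eot_idx] @ self.clip_model.text_projection
```

Training then optimizes only `self.ctx`: encode images with the frozen image encoder, take logits as `logit_scale * image_feats @ text_feats.T`, and minimize cross-entropy against the word labels.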
Screenshots / Videos
Contributions
Learnings
- Learnt about OpenAI's CLIP model, a zero-shot model for measuring semantic similarity between image and text pairs.
- Similarity is computed as the cosine similarity between the image and text embeddings after both are projected into a shared embedding space (a minimal sketch follows this list).
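A few lines making this concrete: once both modalities are projected into the shared space, the image-text score matrix used in the experiments above is just a matrix product of normalized embeddings. The function name and shapes are illustrative.

```python
import torch

def pairwise_cosine(image_embs: torch.Tensor, text_embs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of every (image, text) pair.
    image_embs: (N, D), text_embs: (M, D) -> similarity matrix of shape (N, M)."""
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
    text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
    return image_embs @ text_embs.T
```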